125 research outputs found

Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips

Author: Nakamura Eita
Nakamura Tomohiko
Sagayama Shigeki
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/12/2015

This paper discusses real-time alignment of audio signals of music performances to the corresponding score (a.k.a. score following) that can handle tempo changes, errors, and arbitrary repeats and/or skips (repeats/skips) in performances. This type of score following is particularly useful in automatic accompaniment for practices and rehearsals, where errors and repeats/skips are often made. Simple extensions of the algorithms previously proposed in the literature are not applicable in these situations for scores of practical length because of their large computational complexity. To cope with this problem, we present two hidden Markov models of monophonic performance with errors and arbitrary repeats/skips, and derive efficient score-following algorithms under the assumption that the prior probability distributions of score positions before and after repeats/skips are independent of each other. We confirmed real-time operation of the algorithms with music scores of practical length (around 10000 notes) on a modern laptop, and confirmed that they resume tracking the input performance within 0.7 s on average after repeats/skips in clarinet performance data. Further improvements and extension to polyphonic signals are also discussed.

Comment: 12 pages, 8 figures, version accepted in IEEE/ACM Transactions on Audio, Speech, and Language Processing
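
The independence assumption mentioned in this abstract can be illustrated with a small sketch. It is not the authors' model: the transition structure (a stay probability, an advance probability, and a repeat/skip term written as the product `jump_out[i] * jump_in[j]`) and all function and parameter names are assumptions made for illustration. Because the repeat/skip term factorizes, its contribution to every target position shares one sum, so a forward update costs O(N) instead of O(N^2).

```python
def forward_step(alpha, emission, stay, advance, jump_out, jump_in):
    """One normalized HMM forward update in O(N) time.

    Assumed transition model (illustrative only):
      a[i][j] = stay*[j == i] + advance*[j == i + 1] + jump_out[i]*jump_in[j]
    """
    n = len(alpha)
    # Mass leaving through a repeat/skip; computed once, shared by all targets.
    jump_mass = sum(a * s for a, s in zip(alpha, jump_out))
    new_alpha = []
    for j in range(n):
        local = stay * alpha[j]
        if j > 0:
            local += advance * alpha[j - 1]
        new_alpha.append((local + jump_mass * jump_in[j]) * emission[j])
    total = sum(new_alpha)
    return [p / total for p in new_alpha]


def forward_step_naive(alpha, emission, stay, advance, jump_out, jump_in):
    """Same update with the full N x N transition sum, for comparison."""
    n = len(alpha)
    new_alpha = []
    for j in range(n):
        acc = 0.0
        for i in range(n):
            a_ij = jump_out[i] * jump_in[j]
            if j == i:
                a_ij += stay
            if j == i + 1:
                a_ij += advance
            acc += alpha[i] * a_ij
        new_alpha.append(acc * emission[j])
    total = sum(new_alpha)
    return [p / total for p in new_alpha]


if __name__ == "__main__":
    n = 5
    alpha = [1.0 / n] * n                    # uniform belief over score positions
    emission = [0.1, 0.2, 0.9, 0.2, 0.1]     # made-up observation likelihoods
    jump_out = [0.1] * n                     # made-up repeat/skip probabilities
    jump_in = [1.0 / n] * n                  # made-up landing distribution
    print(forward_step(alpha, emission, 0.3, 0.6, jump_out, jump_in))
```

Both functions return the same posterior over score positions; only the cost differs, which is what matters for scores with thousands of notes.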

arXiv.org e-Print Archive

Crystal-like Symmetric Sensor Arrangements for Blind Decorrelation of Isotropic Wavefield

Author: Nobutaka Ono
Shigeki Sagayama
Publication venue: 'IntechOpen'
Publication date: 01/03/2010

IntechOpen

Crossref

Dynamic Bayesian networks for symbolic polyphonic pitch modeling

Author: Raczynski Stanislaw
Sagayama Shigeki
Vincent Emmanuel
Publication venue: HAL CCSD
Publication date: 27/07/2011

The performance of many MIR analysis algorithms, most importantly polyphonic pitch transcription, can be improved by introducing musicological knowledge into the estimation process. We have developed a probabilistically rigorous musicological model that takes into account dependencies between consecutive musical notes and consecutive chords, as well as the dependencies between chords, notes, and the observed note saliences. We investigate its modeling potential by measuring and comparing the cross-entropy with symbolic (MIDI) data.
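
As a rough illustration of the evaluation measure named in this abstract, the sketch below computes cross-entropy in bits per note for a symbolic pitch sequence. A plain add-one-smoothed bigram model stands in for the paper's dynamic Bayesian network, and the function names, the toy vocabulary, and the MIDI note numbers are all assumptions made for this example.

```python
import math
from collections import Counter


def train_bigram(sequences, vocab):
    """Return P(next | prev) estimated with add-one smoothing (illustrative model)."""
    pair_counts = Counter()
    prev_counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    v = len(vocab)

    def prob(prev, nxt):
        return (pair_counts[(prev, nxt)] + 1) / (prev_counts[prev] + v)

    return prob


def cross_entropy_bits(prob, sequences):
    """Average negative log2-probability per predicted note."""
    total_bits = 0.0
    count = 0
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            total_bits -= math.log2(prob(prev, nxt))
            count += 1
    return total_bits / count


if __name__ == "__main__":
    vocab = {60, 62, 64, 65}                 # toy set of MIDI pitches
    data = [[60, 62, 64, 60, 62, 64]]        # toy melody, not real data
    untrained = train_bigram([], vocab)      # uniform over the vocabulary
    trained = train_bigram(data, vocab)
    print(cross_entropy_bits(untrained, data))
    print(cross_entropy_bits(trained, data))
```

A lower value means the model assigns higher probability to the data. In this toy case the model is scored on the melody it was trained on, which only shows the mechanics; a real comparison would use held-out data.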

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

HAL-Rennes 1

Rhythm Transcription of Polyphonic MIDI Performances Based on a Merged-output HMM for Multiple Voices

Author: Eita Nakamura
Kazuyoshi Yoshii
Shigeki Sagayama

(Abstract to follow)

ZENODO

Robust estimation of directions-of-arrival in diffuse noise based on matrix-space sparsity

Author: Ito Nobutaka
Ono Nobutaka
Sagayama Shigeki
Vincent Emmanuel
Publication venue: HAL CCSD
Publication date: 28/10/2012

We consider the estimation of the Directions-Of-Arrival (DOA) of target signals in diffuse noise. The state-of-the-art MUltiple SIgnal Classification (MUSIC) algorithm necessitates accurate identification of the signal subspace. In diffuse noise, however, it is difficult to identify it directly from the observed spatial covariance matrix. In our approach, we estimate the target spatial covariance matrix, so that we can identify the orthogonal complement of the signal subspace as its null space. We present a unified framework for modeling noise covariance in a matrix space, which generalizes four state-of-the-art diffuse noise models. We propose two alternative algorithms for estimating the target spatial covariance matrix, namely Low-rank Matrix Completion (LMC) and Trace Norm Minimization (TNM). These rely on denoising of the observed spatial covariance matrix via orthogonal projection onto the orthogonal complement of the noise matrix subspace. The missing component lying in the noise matrix subspace is then completed by exploiting the low-rankness of the target spatial covariance matrix. Large-scale experiments with real-world noise show that TNM with a certain noise model outperforms conventional MUSIC based on Generalized EigenValue Decomposition (GEVD) by 5% in terms of the precision averaged over the dataset.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Feature-Dependent Allophone Clustering

Author: Matsuda Shigeki
Nakai Mitsuru
Sagayama Shigeki
Shimodaira Hiroshi
Publication venue: 'International Speech Communication Association'
Publication date: 01/10/2000

We propose a novel method for clustering allophones called Feature-Dependent Allophone Clustering (FD-AC) that determines feature-dependent HMM topology automatically. Existing methods for allophone clustering are based on parameter sharing between allophone models whose feature vector sequences behave similarly. However, the features of the vector sequences do not necessarily all share a common allophone clustering structure. The vector sequences may therefore be modeled better by allocating the optimal allophone clustering structure to each feature. In this paper, we propose Feature-Dependent Successive State Splitting (FD-SSS) as an implementation of FD-AC. In speaker-dependent continuous phoneme recognition experiments, HMMs created by FD-SSS reduced the error rates by about 10% compared with conventional HMMs that have a common allophone clustering structure for all the features.

Edinburgh Research Archive

Asynchronous-Transition HMM

Author: Matsuda Shigeki
Nakai Mitsuru
Sagayama Shigeki
Shimodaira Hiroshi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2000

We propose a new class of hidden Markov model (HMM) called asynchronous-transition HMM (AT-HMM). In contrast to conventional HMMs, in which hidden state transitions occur simultaneously for all features, the new class allows state transitions to occur asynchronously across individual features, to better model the asynchronous timing of acoustic feature changes. In this paper, we focus on a particular class of AT-HMM with sequential constraints based on a novel concept of “state tying along time”. To maximize the advantage of the new model, we also introduce a feature-wise state tying technique. Speaker-dependent speech recognition experiments demonstrated error reduction rates of more than 30% and 50% in phoneme and isolated word recognition, respectively, compared with conventional HMMs.

Edinburgh Research Archive

Multichannel harmonic and percussive component separation by joint modeling of spatial and spectral continuity

Author: Duong Ngoc
Gribonval Rémi
Ono Nobutaka
Sagayama Shigeki
Tachibana Hideyuki
Vincent Emmanuel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/05/2011

This paper considers the blind separation of the harmonic and percussive components of multichannel music signals. We model the contribution of each source to all mixture channels in the time-frequency domain via a spatial covariance matrix, which encodes its spatial characteristics, and a scalar spectral variance, which represents its spectral structure. We then exploit the spatial continuity and the different spectral continuity structures of harmonic and percussive components as prior information to derive maximum a posteriori (MAP) estimates of the parameters using the expectation-maximization (EM) algorithm. Experimental results over professional musical mixtures show the effectiveness of the proposed approach.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Substroke Approach to HMM-based On-line Kanji Handwriting Recognition.

Author: Akira Naoto
Nakai Mitsuru
Sagayama Shigeki
Shimodaira Hiroshi
Publication date: 01/01/2001

A new method is proposed for on-line handwriting recognition of Kanji characters. The method employs substroke HMMs as the minimum units that constitute Japanese Kanji characters and utilizes the direction of pen motion. The main motivation is to fully utilize continuous speech recognition algorithms by relating sentence speech to Kanji characters, phonemes to substrokes, and grammar to Kanji structure. The proposed system consists of input feature analysis, substroke HMMs, a character structure dictionary, and a decoder. The present approach has the following advantages over conventional methods that employ whole-character HMMs: 1) much smaller memory requirements for the dictionary and models; 2) fast recognition through efficient substroke network search; 3) the capability of recognizing characters not included in the training data, provided they are defined as a sequence of substrokes in the dictionary; 4) the capability of recognizing characters written in different stroke orders, through multiple definitions per character in the dictionary; 5) easy adaptation of the HMMs to the user with a few sample characters.

Edinburgh Research Archive